KERNEL ESTIMATORS OF ASYMPTOTIC VARIANCE
FOR ADAPTIVE MARKOV CHAIN MONTE CARLO

By Yves F. Atchade 

University of Michigan 

We study the asymptotic behavior of kernel estimators of asymptotic
variances (or long-run variances) for a class of adaptive Markov
chains. The convergence is studied both in $L^p$ and almost surely. The
results also apply to Markov chains and improve on the existing
literature by imposing weaker conditions. We illustrate the results with
applications to the GARCH(1, 1) Markov model and to an adaptive
MCMC algorithm for Bayesian logistic regression.

1. Introduction. Adaptive Markov chain Monte Carlo (adaptive MCMC)
provides a flexible framework for optimizing MCMC samplers on the fly (see,
e.g., [3, 8, 27] and the references therein). If $\pi$ is the probability measure of
interest, then these adaptive MCMC samplers generate random processes
$\{X_n, n \ge 0\}$ that typically are not Markov, but they nevertheless satisfy
a law of large numbers and the empirical average $n^{-1}\sum_{k=1}^n h(X_k)$ provides
a consistent estimate of the integral $\pi(h) := \mathbb{E}(h(X))$, $X \sim \pi$. A measure of
uncertainty in approximating $\pi(h)$ by the random variable $n^{-1}\sum_{k=1}^n h(X_k)$
is given by the variance $\mathrm{Var}(n^{-1/2}\sum_{k=1}^n h(X_k))$. In particular, the asymptotic
variance $\sigma^2(h) := \lim_{n\to\infty} \mathrm{Var}(n^{-1/2}\sum_{k=1}^n h(X_k))$ (also known as the
long-run variance) plays a fundamental role in assessing the performances
of Monte Carlo simulations. But the problem of estimating asymptotic
variances for adaptive MCMC samplers has not been addressed in the literature.

We study kernel estimators of asymptotic variances for a general class of 
adaptive Markov chains. These adaptive Markov chains (the precise defini- 
tion is given in Section 2 below), which include Markov chains, constitute 
a theoretical framework for analyzing adaptive MCMC algorithms. More 



Received November 2009; revised April 2010. 
Supported in part by NSF Grant DMS-09-06631. 
AMS 2000 subject classifications. 60J10, 60C05. 

Key words and phrases. Adaptive Markov chain Monte Carlo, kernel estimators of 
asymptotic variance. 

This is an electronic reprint of the original article published by the
Institute of Mathematical Statistics in The Annals of Statistics,
2011, Vol. 39, No. 2, 990-1011. This reprint differs from the original in pagination
and typographic detail.


precisely, if $\{X_n, n \ge 0\}$ is an adaptive Markov chain and $h : \mathsf{X} \to \mathbb{R}$ a
function of interest, then we consider estimators of the form

$$\Gamma_n^2(h) = \sum_{k=-n}^{n} w(k b_n)\,\gamma_n(k),$$

where $\gamma_n(k) = \gamma_n(k; h)$ is the $k$th order sample autocovariance of $\{h(X_n), n \ge 0\}$,
$w : \mathbb{R} \to \mathbb{R}$ is a kernel with support $[-1, 1]$ and $b = b_n$ is the bandwidth.
These are well-known methods pioneered by M. S. Bartlett, M. Rosenblatt, 
E. Parzen and others (see, e.g., [26] for more details). But, with a few no- 
table exceptions in the econometrics literature (see references below), these 
estimators have mostly been studied with the assumption of stationarity. 
Thus, more broadly, this paper contributes to the literature on the behav- 
ior of kernel estimators of asymptotic variances for ergodic nonstationary 
processes. 

It turns out that, in general, the asymptotic variance $\sigma^2(h)$ does not
characterize the limiting distribution of $n^{-1/2}\sum_{k=1}^n (h(X_k) - \pi(h))$ as, for
example, with ergodic Markov chains. For adaptive Markov chains, we show
that $n^{-1/2}\sum_{k=1}^n (h(X_k) - \pi(h))$ converges weakly to a mixture of normal
distributions of the form $\sqrt{\Gamma^2(h)}\,Z$ for some mixing random variable $\Gamma^2(h)$,
where $Z$ is a standard normal random variable independent of $\Gamma^2(h)$. Under
a geometric drift stability condition on the adaptive Markov chain and some
verifiable conditions on the kernel $w$ and the bandwidth $b_n$, we prove that
the kernel estimator $\Gamma_n^2(h)$ converges to $\Gamma^2(h)$ in $L^p$-norm, $p > 1$, and almost
surely. For Markov chains, $\Gamma^2(h)$ coincides with $\sigma^2(h)$, the asymptotic
variance of $h$. Another important special case where we have $\Gamma^2(h) = \sigma^2(h)$ is
the one where the adaptation parameter converges to a deterministic limit
as, for instance, with the adaptive Metropolis algorithm of [17]. The general
case where $\Gamma^2(h)$ is random poses some new difficulties to Monte Carlo error
assessment in adaptive MCMC that we discuss in Section 4.3.

We derive the rate of convergence for $\Gamma_n^2(h)$, which suggests selecting
the bandwidth to be $b_n \propto n^{-(2/3)(1 - 0.5 \vee (1/p))}$. When $p = 2$ is admissible, we
obtain the bandwidth $b_n \propto n^{-1/3}$, as in [16].

The problem of estimating asymptotic variances is well known in MCMC 
and Monte Carlo simulation in general. Besides the estimator described 
above, several other methods have been proposed, including batch means, 
overlapping batch means and regenerative simulation ([12, 13, 16, 24]). For 
the asymptotics of kernel estimators, the important work of [16] proves the
$L^2$-consistency and strong consistency of kernel estimators for Markov chains
under the assumption of geometric ergodicity and $\mathbb{E}(|h(X)|^{4+\varepsilon}) < \infty$, $X \sim \pi$,
for some $\varepsilon > 0$. We weaken these moment conditions to $\mathbb{E}(|h(X)|^{2+\varepsilon}) < \infty$.

Estimating asymptotic variances is also a well-known problem in
econometrics and time series modeling. For example, if $\hat\beta_n$ is the ordinary
least-squares estimator of $\beta$ in the simple linear model $y_i = \alpha + \beta x_i + u_i$,
$i = 1, \ldots, n$, where $\{u_k, k \ge 1\}$ is a dependent noise process, then, under some
mild conditions on the sequence $\{x_i\}$ and on the noise process, $\sqrt{n}(\hat\beta_n - \beta)$
converges weakly to a normal distribution $\mathcal{N}(0, \sigma^2/c^2)$, where

$$\sigma^2 = \lim_{n\to\infty} \mathrm{Var}\Big(n^{-1/2}\sum_{k=1}^n u_k\Big), \qquad c^2 = \lim_{n\to\infty} n^{-1}\sum_{k=1}^n (x_k - \bar x_n)^2, \qquad \bar x_n = n^{-1}\sum_{k=1}^n x_k.$$

Therefore, a valid inference on $\beta$ requires the estimation of the asymptotic
variance $\sigma^2$. The multivariate version of this problem involves estimating
the so-called heteroskedasticity and autocorrelation (HAC) matrices. Sev- 
eral authors have studied the kernel estimation of HAC matrices and at- 
tention has been paid to nonstationarity under various mixing assumptions 
or mixingale-type assumptions ([1, 14, 15, 19]). But these results require 
mixing conditions that do not hold in the present setup. 

On a more technical note, the proof of our main results (Theorems 4.1-4.3)
is based on a martingale approximation approach adapted from [29].
The crux of the argument consists in approximating the periodogram of 
the adaptive Markov chain by a quadratic form of a martingale difference 
process which is then treated as a martingale array. As part of the proof, 
we develop a strong law of large numbers for martingale arrays which may 
also be of some independent interest. The approach taken here thus differs 
from the almost sure strong approximation approach taken in [13, 16]. 

The paper is organized as follows. In Section 2, we define the class of 
adaptive Markov chains that will be studied. In Section 3, we give a general 
central limit theorem for adaptive Markov chains that sets the stage to
better understand the limiting behavior of the kernel estimator $\Gamma_n^2(h)$. In
Section 4, we state the assumptions and the main results of the paper. We also
discuss some practical implications of these theoretical results. The proofs 
are postponed to Section 6 and to the supplementary paper [5]. Section 5 
presents applications to generalized autoregressive conditional heteroscedas- 
tic (GARCH) processes and to a Bayesian analysis of logistic regression. 

We end this introduction with some general notation that will be used
throughout the paper. For a Markov kernel $Q$ on a measurable space $(\mathsf{Y}, \mathcal{A})$,
say, we denote by $Q^n$, $n \ge 0$, its $n$th iterate. Any such Markov kernel $Q$
acts both on bounded measurable functions $f$ and on $\sigma$-finite measures
$\mu$, as in $Qf(\cdot) := \int Q(\cdot, dy) f(y)$ and $\mu Q(\cdot) := \int \mu(dx) Q(x, \cdot)$. If $W : \mathsf{Y} \to
[1, +\infty)$ is a function, then the $W$-norm of a function $f : \mathsf{Y} \to \mathbb{R}$ is defined
as $|f|_W := \sup_{\mathsf{Y}} |f|/W$. The set of measurable functions $f : \mathsf{Y} \to \mathbb{R}$ with
finite $W$-norm is denoted by $\mathcal{L}_W$. Similarly, if $\mu$ is a signed measure on
$(\mathsf{Y}, \mathcal{A})$, then the $W$-norm of $\mu$ is defined as $\|\mu\|_W := \sup_{\{g,\, |g|_W \le 1\}} |\mu(g)|$,
where $\mu(g) := \int g(y)\mu(dy)$. If $\nu$ is a $\sigma$-finite measure on $(\mathsf{Y}, \mathcal{A})$ and $q \ge 1$,
we denote by $L^q(\nu)$ the space of all measurable functions $f : (\mathsf{Y}, \mathcal{A}) \to \mathbb{R}$
such that $\nu(|f|^q) < \infty$. Finally, for $a, b \in \mathbb{R}$, we define $a \wedge b = \min(a, b)$ and
$a \vee b = \max(a, b)$.

2. Adaptive Markov chains. Let $(\mathsf{X}, \mathcal{X})$ be a measurable state space
endowed with a countably generated $\sigma$-field $\mathcal{X}$. Let $(\Theta, \mathcal{B}(\Theta))$ be
a measure space. In practice, we will take $\Theta$ to be a compact subspace of $\mathbb{R}^q$,
the $q$-dimensional Euclidean space. Let $\{P_\theta, \theta \in \Theta\}$ be a family of Markov
transition kernels on $(\mathsf{X}, \mathcal{X})$ such that for any $(x, A) \in \mathsf{X} \times \mathcal{X}$, $\theta \mapsto P_\theta(x, A)$
is measurable. Let $\pi$ be a probability measure on $(\mathsf{X}, \mathcal{X})$. We assume that
for each $\theta \in \Theta$, $P_\theta$ admits $\pi$ as its invariant distribution.

The stochastic processes of interest in this work are defined as follows. Let
$\Omega = (\mathsf{X} \times \Theta)^\infty$ be the product space equipped with its product $\sigma$-algebra $\mathcal{F}$
and let $\bar\mu$ be a probability measure on $(\mathsf{X} \times \Theta, \mathcal{X} \times \mathcal{B}(\Theta))$. Let $\mathbb{P}_{\bar\mu}$ be the
probability measure on $(\Omega, \mathcal{F})$ with associated expectation operator $\mathbb{E}_{\bar\mu}$,
associated process $\{(X_n, \theta_n), n \ge 0\}$ and associated natural filtration $\{\mathcal{F}_n, n \ge 0\}$,
with the following properties: $(X_0, \theta_0) \sim \bar\mu$ and, for each $n \ge 0$ and any
nonnegative measurable function $f : \mathsf{X} \to \mathbb{R}$,

$$\mathbb{E}_{\bar\mu}(f(X_{n+1}) \mid \mathcal{F}_n) = P_{\theta_n} f(X_n) = \int P_{\theta_n}(X_n, dy) f(y), \qquad \mathbb{P}_{\bar\mu}\text{-a.s.} \tag{2.1}$$

We call the $\mathsf{X}$-marginal process $\{X_n, n \ge 0\}$ an adaptive Markov chain. In
this definition, we have left the adaptation dynamics (i.e., the conditional
distribution of $\theta_{n+1}$ given $\mathcal{F}_n$ and $X_{n+1}$) unspecified. This can be done in
many different ways (see, e.g., [27]). But it is well known, as we will see
later, that the adaptation dynamics needs to be diminishing in order for the
adaptive Markov chain to maintain $\pi$ as its limiting distribution.

The simplest example of an adaptive Markov chain is the case where
$\theta_n \equiv \theta \in \Theta$ for all $n \ge 0$. Then $\{X_n, n \ge 0\}$ is a Markov chain with transition
kernel $P_\theta$. In other words, our analysis also applies to Markov chains and,
in particular, to Markov chain Monte Carlo.

Example 2.1. To illustrate the definitions and, later, the results, we
present a version of the adaptive Metropolis algorithm of [17]. We take
$\mathsf{X} = \mathbb{R}^d$ equipped with its Euclidean norm and inner product, denoted by $|\cdot|$
and $\langle\cdot,\cdot\rangle$, respectively. Let $\pi$ be a positive, possibly unnormalized, density
(with respect to the Lebesgue measure). We construct the parameter space $\Theta$
as follows. We equip the set $\mathcal{M}_+$ of all $d$-dimensional symmetric positive
semidefinite matrices with the Frobenius norm $|A| := \sqrt{\mathrm{Tr}(A^T A)}$ and inner
product $\langle A, B\rangle = \mathrm{Tr}(A^T B)$. For $r > 0$, let $\Theta_+(r)$ be the compact subset of
elements $A \in \mathcal{M}_+$ such that $|A| \le r$. Let $\Theta_\mu(r)$ be the ball centered at 0
and with radius $r$ in $\mathbb{R}^d$. We then define $\Theta := \Theta_\mu(r_1) \times \Theta_+(r_2)$ for some
constants $r_1, r_2 > 0$.
We introduce the functions $\Pi_\mu : \mathbb{R}^d \to \Theta_\mu(r_1)$ and $\Pi_+ : \mathcal{M}_+ \to \Theta_+(r_2)$,
defined as follows. For $v \in \Theta_\mu(r_1)$, $\Pi_\mu(v) = v$ and for $v \notin \Theta_\mu(r_1)$, $\Pi_\mu(v) = \frac{M}{|v|} v$.
Similarly, for $\Sigma \in \Theta_+(r_2)$, $\Pi_+(\Sigma) = \Sigma$ and for $\Sigma \notin \Theta_+(r_2)$, $\Pi_+(\Sigma) = \frac{M}{|\Sigma|} \Sigma$.

For $\theta = (\mu, \Sigma) \in \Theta$, let $P_\theta$ be the transition kernel of the random walk
Metropolis (RWM) algorithm with proposal kernel $\mathcal{N}(x, \frac{2.38^2}{d}\Sigma + \varepsilon I_d)$ and
target distribution $\pi$. The adaptive Metropolis algorithm works as follows.

Algorithm 2.1. Initialization: Choose $X_0 \in \mathbb{R}^d$, $(\mu_0, \Sigma_0) \in \Theta$. Let $\{\gamma_n\}$
be a sequence of positive numbers (we use $\gamma_n = n^{-0.7}$ in the simulations).
Iteration: Given $(X_n, \mu_n, \Sigma_n)$:

(1) generate $Y_{n+1} \sim \mathcal{N}(X_n, \frac{2.38^2}{d}\Sigma_n + \varepsilon I_d)$; with probability $\alpha_{n+1} =
\alpha(X_n, Y_{n+1})$, set $X_{n+1} = Y_{n+1}$ and with probability $1 - \alpha_{n+1}$, set $X_{n+1} = X_n$;

(2) set

$$\mu_{n+1} = \Pi_\mu\big(\mu_n + (n+1)^{-1}(X_{n+1} - \mu_n)\big), \tag{2.2}$$

$$\Sigma_{n+1} = \Pi_+\big(\Sigma_n + (n+1)^{-1}((X_{n+1} - \mu_n)(X_{n+1} - \mu_n)^T - \Sigma_n)\big). \tag{2.3}$$

Thus, given $\mathcal{F}_n = \sigma\{X_k, \mu_k, \Sigma_k, k \le n\}$, $X_{n+1} \sim P_{\theta_n}(X_n, \cdot)$, where $P_{\theta_n}$ is
the Markov kernel of the random walk Metropolis with target $\pi$ and
proposal $\mathcal{N}(x, \frac{2.38^2}{d}\Sigma_n + \varepsilon I_d)$. So, this algorithm generates a random process
$\{(X_n, \theta_n), n \ge 0\}$ that is an adaptive Markov chain, as defined above. Here,
the adaptation dynamics is given by (2.2) and (2.3).
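The iteration of Algorithm 2.1 can be sketched in a few lines for the one-dimensional case. This is only an illustrative sketch under stated assumptions: it takes d = 1, assumes that the acceptance probability is the usual Metropolis ratio min(1, pi(y)/pi(x)), assumes that the projections act as radial rescaling onto the balls of radius r1 and r2 (which is plain clipping when d = 1), and uses the proposal scaling 2.38^2/d. The function name and all default values (r1, r2, eps, seed) are hypothetical.

```python
import math
import random


def adaptive_metropolis_1d(log_target, x0, n_iter, r1=10.0, r2=100.0,
                           eps=0.01, seed=0):
    """Sketch of Algorithm 2.1 for d = 1 (all defaults are illustrative)."""
    rng = random.Random(seed)
    # (X_0, mu_0, Sigma_0), with (mu_0, Sigma_0) inside the parameter set.
    x, mu, sigma2 = x0, 0.0, 1.0
    chain = []
    for n in range(n_iter):
        # Step (1): propose Y ~ N(X_n, 2.38^2 * Sigma_n + eps), here d = 1.
        y = rng.gauss(x, math.sqrt(2.38 ** 2 * sigma2 + eps))
        log_ratio = log_target(y) - log_target(x)
        # Assumed acceptance rule: accept with probability min(1, pi(Y)/pi(X_n)).
        accept = log_ratio >= 0.0 or rng.random() < math.exp(log_ratio)
        x_next = y if accept else x
        # Step (2): recursions (2.2)-(2.3); both use the old mean mu_n.
        step = 1.0 / (n + 1)
        mu_next = mu + step * (x_next - mu)
        sigma2_next = sigma2 + step * ((x_next - mu) ** 2 - sigma2)
        # Projections: in one dimension, rescaling onto the ball is clipping.
        mu = max(-r1, min(r1, mu_next))
        sigma2 = max(0.0, min(r2, sigma2_next))
        x = x_next
        chain.append(x)
    return chain, mu, sigma2


# Usage: standard normal target, log pi(x) = -x^2 / 2 up to a constant.
chain, mu_hat, sigma2_hat = adaptive_metropolis_1d(lambda x: -0.5 * x * x, 0.0, 5000)
```

The returned pair (mu_hat, sigma2_hat) plays the role of the adaptation parameter at the last iteration; the chain itself is the X-marginal process to which the variance estimators of Section 4 would be applied.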

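Throughout the paper, we fix the initial measure of the process to some
arbitrary measure $\bar\mu$ and simply write $\mathbb{E}$ and $\mathbb{P}$ for $\mathbb{E}_{\bar\mu}$ and $\mathbb{P}_{\bar\mu}$, respectively.
We impose the following geometric ergodicity assumption.

A1: For each $\theta \in \Theta$, $P_\theta$ is phi-irreducible and aperiodic with invariant
distribution $\pi$. There exists a measurable function $V : \mathsf{X} \to [1, \infty)$ with
$\int V(x)\bar\mu(dx, d\theta) < \infty$ such that for any $\beta \in (0, 1]$, there exist $\rho \in (0, 1)$,
$C \in (0, \infty)$ such that for any $(x, \theta) \in \mathsf{X} \times \Theta$,

$$\|P_\theta^n(x, \cdot) - \pi(\cdot)\|_{V^\beta} \le C\rho^n V^\beta(x), \qquad n \ge 0. \tag{2.4}$$

Furthermore, there exist constants $b \in (0, \infty)$, $\lambda \in (0, 1)$ such that for any
$(x, \theta) \in \mathsf{X} \times \Theta$,

$$P_\theta V(x) \le \lambda V(x) + b. \tag{2.5}$$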

Condition (2.4) is a standard geometric ergodicity assumption. We impose
(2.5) in order to control the moments of the adaptive process. Condition (2.5)
is probably redundant since geometric ergodicity intuitively implies a drift
behavior of the form (2.5). But this is rarely an issue because both (2.4) and
(2.5) are implied by the following minorization and drift conditions.
DR: Uniformly for $\theta \in \Theta$, there exist $C \in \mathcal{X}$, $\nu$ a probability measure on
$(\mathsf{X}, \mathcal{X})$, $b, \varepsilon > 0$ and $\lambda \in (0, 1)$ such that $\nu(C) > 0$, $P_\theta(x, \cdot) \ge \varepsilon\nu(\cdot)\mathbf{1}_C(x)$ and

$$P_\theta V \le \lambda V + b\mathbf{1}_C. \tag{2.6}$$

This assertion follows from Theorem 1.1 of [10]. DR is known to hold
for many Markov kernels used in MCMC simulation (see, e.g., [16] for some
references). Either drift condition (2.5) or (2.6) implies that $\pi(V) < \infty$ ([22],
Theorem 14.3.7). Therefore, under A1, if $f \in \mathcal{L}_{V^\beta}$ for some $\beta \in [0, 1]$, then
$f \in L^{1/\beta}(\pi)$. Finally, we note that under A1, a law of large numbers can be
established for the adaptive chain (see, e.g., [7]). A short proof is provided
here for completeness.

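To state the law of large numbers, we need the following pseudo-metric
on $\Theta$. For $\beta \in [0, 1]$, $\theta, \theta' \in \Theta$, set

$$D_\beta(\theta, \theta') := \sup_{|f|_{V^\beta} \le 1}\ \sup_{x \in \mathsf{X}} \frac{|P_\theta f(x) - P_{\theta'} f(x)|}{V^\beta(x)}.$$

Proposition 2.1. Assume A1. Let $\beta \in [0, 1)$ and $\{h_\theta \in \mathcal{L}_{V^\beta}, \theta \in \Theta\}$ be
a family of functions such that $\pi(h_\theta) = 0$, $(x, \theta) \to h_\theta(x)$ is measurable and
$\sup_{\theta \in \Theta} |h_\theta|_{V^\beta} < \infty$. Suppose also that

$$\sum_{k \ge 1} k^{-1}\big(D_\beta(\theta_k, \theta_{k-1}) + |h_{\theta_k} - h_{\theta_{k-1}}|_{V^\beta}\big) V^\beta(X_k) < \infty, \qquad \mathbb{P}\text{-a.s.} \tag{2.7}$$

Then $n^{-1}\sum_{k=1}^n h_{\theta_{k-1}}(X_k)$ converges almost surely ($\mathbb{P}$) to zero.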



Proof. See Section 6.1. □ 



3. A central limit theorem. Central limit theorems are useful in assessing
Monte Carlo errors. Several papers have studied central limit theorems for
adaptive MCMC ([2, 7, 28]). The next proposition is adapted from [6]. For
$h \in \mathcal{L}_V$, we introduce the resolvent functions

$$g_\theta(x) := \sum_{j \ge 0} \bar P_\theta^j h(x),$$

where $\bar P_\theta := P_\theta - \pi$. The dependence of $g_\theta$ on $h$ is omitted for notational
convenience. We also define $G_\theta(x, y) = g_\theta(y) - P_\theta g_\theta(x)$, where $P_\theta g_\theta(x) :=
\int P_\theta(x, dz) g_\theta(z)$. Whenever $g_\theta$ is well defined, it satisfies the so-called
Poisson equation

$$h(x) = g_\theta(x) - P_\theta g_\theta(x). \tag{3.1}$$

Proposition 3.1. Assume A1. Let $\beta \in [0, 1/2)$ and $h \in \mathcal{L}_{V^\beta}$ be such
that $\pi(h) = 0$. Suppose that there exists a nonnegative random variable $\Gamma^2(h)$,
finite $\mathbb{P}$-a.s., such that

$$\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^n G_{\theta_{k-1}}^2(X_{k-1}, X_k) = \Gamma^2(h) \qquad \text{in } \mathbb{P}\text{-probability}. \tag{3.2}$$

Suppose also that

$$\sum_{k \ge 1} k^{-1/2} D_\beta(\theta_k, \theta_{k-1}) V^\beta(X_k) < \infty, \qquad \mathbb{P}\text{-a.s.} \tag{3.3}$$

Then $n^{-1/2}\sum_{k=1}^n h(X_k)$ converges weakly to a random variable $\sqrt{\Gamma^2(h)}\,Z$,
where $Z \sim \mathcal{N}(0, 1)$ is a standard normal random variable independent of $\Gamma^2(h)$.

Proof. See Section 6.2. □ 
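For a fixed kernel on a finite state space, the objects entering (3.1) and (3.2) can be computed directly, which may help fix ideas. The sketch below is an illustration under assumptions that are not part of the paper: a hypothetical two-state chain with transition probabilities a and b, and a test function h that is centered before use so that pi(h) = 0. It sums the resolvent series, checks the Poisson equation numerically, and evaluates the stationary average of G^2, which is the limit one would expect in (3.2) by the ergodic theorem when theta is held fixed.

```python
def two_state_poisson(a, b, h, n_terms=2000):
    """Resolvent g, P g and stationary mean of G^2 for a two-state chain.

    a = P(0 -> 1), b = P(1 -> 0); h is a list of two values (centered below).
    """
    P = [[1.0 - a, a], [b, 1.0 - b]]
    pi = [b / (a + b), a / (a + b)]           # invariant distribution
    pi_h = pi[0] * h[0] + pi[1] * h[1]
    hc = [h[0] - pi_h, h[1] - pi_h]           # centered function, pi(hc) = 0
    # g = sum_{j >= 0} P^j hc, truncated after n_terms terms.
    g = [0.0, 0.0]
    term = hc[:]
    for _ in range(n_terms):
        g = [g[0] + term[0], g[1] + term[1]]
        term = [P[0][0] * term[0] + P[0][1] * term[1],
                P[1][0] * term[0] + P[1][1] * term[1]]
    Pg = [P[x][0] * g[0] + P[x][1] * g[1] for x in range(2)]
    # G(x, y) = g(y) - P g(x), averaged under pi(dx) P(x, dy).
    gamma2 = sum(pi[x] * P[x][y] * (g[y] - Pg[x]) ** 2
                 for x in range(2) for y in range(2))
    return pi, hc, g, Pg, gamma2


pi, hc, g, Pg, gamma2 = two_state_poisson(0.3, 0.2, [1.0, 0.0])
# Residual of the Poisson equation hc = g - P g at each state.
residual = max(abs(hc[x] - (g[x] - Pg[x])) for x in range(2))
```

For this chain the second eigenvalue is 1 - a - b = 0.5, so the classical formula pi(h^2)(1 + 0.5)/(1 - 0.5) with pi(hc^2) = 0.24 gives 0.72, which is the value that gamma2 reproduces.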

Condition (3.3), which strengthens (2.7), is a diminishing adaptation con- 
dition and is not hard to check in general. It follows from the following 
assumption which is much easier to check in practice. 

A2: There exist $\eta \in [0, 1/2)$ and a nonincreasing sequence of positive
numbers $\{\gamma_n, n \ge 1\}$, $\gamma_n = O(n^{-\alpha})$, $\alpha > 1/2$, such that for any $\beta \in [0, 1]$, there
exists a finite constant $C$ such that

$$D_\beta(\theta_{n-1}, \theta_n) \le C\gamma_n V^\eta(X_n), \qquad \mathbb{P}\text{-a.s.} \tag{3.4}$$

[2] establishes A2 for the random walk Metropolis and the independence 
sampler. A similar result is obtained for the Metropolis adjusted Langevin 
algorithm in [4]. The constant $\eta$ in A2 reflects the additional fluctuations due
to the adaptation. For example, for a Metropolis algorithm with adaptation
driven by a stochastic approximation of the form $\theta_{n+1} = \theta_n + \gamma_n H(\theta_n, X_{n+1})$,
$\eta$ is any nonnegative number such that $\sup_{\theta \in \Theta} |H(\theta, \cdot)|_{V^\eta} < \infty$.

Proposition 3.2. Under A1-A2, (3.3) holds.

Proof. Under A2, the left-hand side of (3.3) is bounded almost surely
by $C\sum_{k \ge 1} k^{-1/2}\gamma_k V^{\eta+\beta}(X_k)$, the expectation of which is bounded by the
term $C\sum_{k \ge 1} k^{-1/2}\gamma_k$ according to Lemma A.1(a), assuming A1. Since $\alpha >
1/2$, we conclude that (3.3) holds. □

Equation (3.2) is also a natural assumption. Indeed, in most adaptive
MCMC algorithms, we seek to find the "best" Markov kernel from the family
$\{P_\theta, \theta \in \Theta\}$ to sample from $\pi$. Thus, it is often the case that $\theta_n$ converges to
some limit $\theta_\star$, say (see, e.g., [2, 3, 6, 9]). In these cases, (3.2) actually holds.

Proposition 3.3. Assume A1-A2. Let $\beta \in [0, (1 - \eta)/2)$, where $\eta$ is
as in A2, and let $h \in \mathcal{L}_{V^\beta}$ be such that $\pi(h) = 0$. Suppose that there exists
a $\Theta$-valued random variable $\theta_\star$ such that $D_\beta(\theta_n, \theta_\star) + D_{2\beta}(\theta_n, \theta_\star)$ converges
in probability to zero. Then (3.2) holds. Furthermore,



Proof. See Section 6.3. □ 

Definition 3.1. We call the random variable $\Gamma^2(h)$ the asymptotic
average squared variation of $h$ and $\sigma^2(h) := \mathbb{E}(\Gamma^2(h))$ the asymptotic variance
of $h$.

This definition is justified by the following result.

Proposition 3.4. Assume A1-A2. Let $\beta \in [0, 1/2)$ and $h \in \mathcal{L}_{V^\beta}$ be such
that $\pi(h) = 0$. Assume that (3.2) holds. Then



Proof. See Section 6.4. □

4. Asymptotic variance estimation. Denote by $\pi_n(h) = n^{-1}\sum_{k=1}^n h(X_k)$
the sample mean of $h(X_k)$ and denote by $\gamma_n(k)$ the sample autocovariance:
$\gamma_n(k) = 0$ for $|k| \ge n$, $\gamma_n(-k) = \gamma_n(k)$ and for $0 \le k < n$,

$$\gamma_n(k) = \frac{1}{n}\sum_{j=1}^{n-k} (h(X_j) - \pi_n(h))(h(X_{j+k}) - \pi_n(h)).$$

Let $w : \mathbb{R} \to \mathbb{R}$ be a function with support $[-1, 1]$ [$w(x) = 0$ for $|x| > 1$].
We assume that $w$ satisfies the following.

A3. The function $w$ is even [$w(-x) = w(x)$] and $w(0) = 1$. Moreover, the
restriction $w : [0, 1] \to \mathbb{R}$ is twice continuously differentiable.

Typical examples of kernels that satisfy A3 include, among others, the
family of kernels

$$w(x) = \begin{cases} 1 - |x|^q, & \text{if } |x| \le 1, \\ 0, & \text{if } |x| > 1, \end{cases} \tag{4.1}$$

for $q \ge 1$. The case $q = 1$ corresponds to the Bartlett kernel. A3 is also
satisfied by the Parzen kernel

$$w(x) = \begin{cases} 1 - 6x^2 + 6|x|^3, & \text{if } |x| \le \tfrac{1}{2}, \\ 2(1 - |x|)^3, & \text{if } \tfrac{1}{2} \le |x| \le 1, \\ 0, & \text{if } |x| > 1. \end{cases} \tag{4.2}$$

Our analysis does not cover nontruncated kernels such as the quadratic 
spectral kernel. But truncated kernels have the advantage of being compu- 
tationally more efficient. 
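The two truncated kernels discussed above are simple to code. The following is a minimal sketch, assuming that the family (4.1) is w(x) = 1 - |x|^q on [-1, 1] (so that q = 1 is the Bartlett kernel) and that the Parzen kernel has the usual piecewise cubic form; the function names are hypothetical.

```python
def power_kernel(x, q=1.0):
    """Assumed form of the family (4.1): 1 - |x|^q on [-1, 1], 0 outside."""
    ax = abs(x)
    return 1.0 - ax ** q if ax <= 1.0 else 0.0


def parzen_kernel(x):
    """Parzen kernel: piecewise cubic, supported on [-1, 1]."""
    ax = abs(x)
    if ax <= 0.5:
        return 1.0 - 6.0 * ax ** 2 + 6.0 * ax ** 3
    if ax <= 1.0:
        return 2.0 * (1.0 - ax) ** 3
    return 0.0
```

Both functions are even, equal 1 at the origin and vanish outside [-1, 1]; the two Parzen branches agree at |x| = 1/2, where each gives 0.25.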

Let $\{b_n, n \ge 1\}$ be a nonincreasing sequence of positive numbers such that

$$b_n^{-1} = O(n^{1/2}) \quad\text{and}\quad |b_n - b_{n-1}| = O(b_n n^{-1}) \quad\text{as } n \to \infty. \tag{4.3}$$

We consider the class of kernel estimators of the form

$$\Gamma_n^2(h) = \sum_{k=-n}^{n} w(k b_n)\gamma_n(k) = \sum_{k=-b_n^{-1}+1}^{b_n^{-1}-1} w(k b_n)\gamma_n(k). \tag{4.4}$$
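The estimator (4.4) translates directly into code once the sample autocovariances are available. The sketch below is a plain, unoptimized illustration: it assumes the values h(X_1), ..., h(X_n) are given as a list, uses the symmetry of the sample autocovariance to sum over nonnegative lags only, and stops at the first lag k with k b_n >= 1 because the kernel vanishes there. The function names and the toy input are hypothetical.

```python
def bartlett(x):
    """Bartlett kernel, 1 - |x| on [-1, 1]."""
    ax = abs(x)
    return 1.0 - ax if ax <= 1.0 else 0.0


def kernel_variance_estimate(h_vals, b, w=bartlett):
    """Kernel estimate: sum over lags k of w(k * b) * gamma_n(k)."""
    n = len(h_vals)
    mean = sum(h_vals) / n
    centered = [v - mean for v in h_vals]
    # Lag 0 term: gamma_n(0), with the 1/n normalization.
    total = sum(c * c for c in centered) / n
    k = 1
    while k < n and k * b < 1.0:
        gamma_k = sum(centered[j] * centered[j + k] for j in range(n - k)) / n
        total += 2.0 * w(k * b) * gamma_k   # lags k and -k contribute equally
        k += 1
    return total


# Toy usage: with b = 0.5 only lags 0 and +-1 enter the sum.
estimate = kernel_variance_estimate([1.0, 2.0, 3.0, 4.0], 0.5)
```

For the toy input, the lag-0 autocovariance is 1.25 and the lag-1 autocovariance is 0.3125, so the estimate is 1.25 + 2(0.5)(0.3125) = 1.5625. The double loop costs on the order of n / b_n operations, which is acceptable for a sketch.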

The following is the main $L^p$-convergence result.

Theorem 4.1. Assume A1-A3. Let $\beta \in (0, 1/2 - \eta)$ and $h \in \mathcal{L}_{V^\beta}$, where
$\eta$ is as in A2. Then

$$\Gamma_n^2(h) = \frac{1}{n}\sum_{k=1}^n G_{\theta_{k-1}}^2(X_{k-1}, X_k) + Q_n + D_n + \epsilon_n, \qquad n \ge 1. \tag{4.5}$$

The random process $\{(Q_n, D_n, \epsilon_n), n \ge 1\}$ is such that for any $p > 1$ such
that $2p(\beta + \eta) \le 1$, there exists a finite constant $C$ such that

$$\mathbb{E}(|Q_n|^p) \le C\big(b_n + n^{-\alpha}b_n^{-1+\alpha} + n^{-1+(1/2)\vee(1/p)}b_n^{-1/2}\big)^p, \qquad \mathbb{E}(|D_n|^p) \le C\gamma_n^p \quad\text{and}\quad \mathbb{E}(|\epsilon_n|^p) \le C(n^{-1}b_n^{-1})^p. \tag{4.6}$$

In particular, if $\lim_{n\to\infty} n^{-1+(1/2)\vee(1/p)}b_n^{-1/2} = 0$, then

$$\Gamma_n^2(h) - \frac{1}{n}\sum_{k=1}^n G_{\theta_{k-1}}^2(X_{k-1}, X_k)$$

converges to zero in $L^p$.

Proof. The proof is given in the supplementary article [5]. □ 

Remark 4.1. In Theorem 4.1, we can always take $p = 1/(2(\beta + \eta)) >
1$. In this case, with $b_n \propto n^{-\delta}$, the condition $\lim_{n\to\infty} n^{-1+(1/2)\vee(1/p)}b_n^{-1/2} = 0$ translates to
$0.5 \vee (2(\beta + \eta)) + 0.5\delta < 1$. Therefore, if $\beta + \eta$ is close to 1/2, we need to
choose $\delta$ small. This remark implies that in applying the above result, one
should always try to find the smallest possible $\beta$ such that $h \in \mathcal{L}_{V^\beta}$.

It can be easily checked that the choice of bandwidth $b_n \propto n^{-\delta}$ with $\delta =
\frac{2}{3}(1 - 0.5 \vee (2(\beta + \eta)))$ always satisfies Theorem 4.1. In fact, we will see in
Section 4.2 that this choice of $b_n$ is optimal in the $L^p$-norm, $p = (2(\beta + \eta))^{-1}$.
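The bandwidth rule of Remark 4.1 is easy to tabulate. The helper below is a sketch under the reading that the exponent is delta = (2/3)(1 - 0.5 v 2(beta + eta)); the function names and the proportionality constant c are hypothetical, and the paper leaves the constant unspecified at this point.

```python
def bandwidth_exponent(beta, eta):
    """Exponent delta in b_n = c * n^(-delta), as read from Remark 4.1."""
    s = 2.0 * (beta + eta)
    if not 0.0 < s < 1.0:
        raise ValueError("need 0 < 2 * (beta + eta) < 1 so that p = 1/s > 1")
    return (2.0 / 3.0) * (1.0 - max(0.5, s))


def bandwidth(n, beta, eta, c=1.0):
    """Bandwidth b_n = c * n^(-delta); c is a hypothetical tuning constant."""
    return c * n ** (-bandwidth_exponent(beta, eta))


delta_light = bandwidth_exponent(0.1, 0.1)   # 2(beta + eta) = 0.4 <= 0.5
delta_heavy = bandwidth_exponent(0.3, 0.1)   # 2(beta + eta) = 0.8 >  0.5
```

When 2(beta + eta) <= 1/2 the exponent is 1/3, the classical rate; as beta + eta approaches 1/2 the exponent shrinks toward zero, in line with the remark that delta must then be chosen small.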

It is possible to investigate more carefully the rate of convergence of $\Gamma_n^2(h)$
in Theorem 4.1. Indeed, consider the typical case where $p = 2$ is admissible
and we have $\alpha = 1$. If we choose $b_n$ such that $b_n = o(n^{-1/3})$ and $n^{-1} =
o(b_n)$, then the slowest term in (4.6) is $n^{-1+(1/2)\vee(1/p)}b_n^{-1/2} = (nb_n)^{-1/2}$. By
inspecting the proof of Theorem 4.1, the only term whose $L^p$-norm enjoys
such rate $n^{-1+(1/2)\vee(1/p)}b_n^{-1/2}$ is

$$Q_n^{(1)} = 2n^{-1}\sum_{j \le n} Z_{n,j}^{(1)} G_{\theta_{j-1}}(X_{j-1}, X_j),$$

where 

j'-i 

Now, $\{(Q_n^{(1)}, \mathcal{F}_n), n \ge 2\}$ is a martingale array and we conjecture that as
$n \to \infty$,

$$(nb_n)^{1/2}\Big(\Gamma_n^2(h) - \frac{1}{n}\sum_{k=1}^n G_{\theta_{k-1}}^2(X_{k-1}, X_k)\Big) \xrightarrow{w} \mathcal{N}(0, \Lambda^2),$$

at least in the special case where $\theta_n$ converges to a deterministic limit. But
we do not pursue this further since the issue of a central limit theorem for
$\Gamma_n^2(h)$ is less relevant for Monte Carlo simulation.

When {X n ,n > 0} is a Markov chain, Theorem 4.1 improves on [16], as it 
imposes weaker moment conditions. Almost sure convergence is often more 
desirable in Monte Carlo settings, but typically requires stronger assump- 
tions. One can impose either more restrictive growth conditions on h (which 
translates into stronger moment conditions, as in [16]) or one can impose 
stronger smoothness conditions on the function w. We prove both types of 
results. 

Theorem 4.2. Assume A1-A3 with $\eta < 1/4$, where $\eta$ is as in A2.
Let $\beta \in (0, 1/4 - \eta)$ and $h \in \mathcal{L}_{V^\beta}$. Suppose that $b_n \propto n^{-\delta}$, where $\delta \in (2(\beta +
\eta), 1/2)$. Then

$$\lim_{n\to\infty}\Big(\Gamma_n^2(h) - \frac{1}{n}\sum_{k=1}^n G_{\theta_{k-1}}^2(X_{k-1}, X_k)\Big) = 0$$

almost surely.

Proof. The proof is given in the supplementary article [5]. □

We can remove the growth condition $h \in \mathcal{L}_{V^\beta}$, $0 < \beta < 0.25 - \eta$, and
the constraint on $b_n$ in Theorem 4.2 if we are willing to impose a stronger
smoothness condition on $w$. To do so, we replace A3 with A4.

A4: The function $w$ is even [$w(-x) = w(x)$] and $w(0) = 1$. Moreover, the
restriction $w : [0, 1] \to \mathbb{R}$ is $(r + 1)$-times continuously differentiable for some
$r \ge 2$.

Theorem 4.3. Assume A1-A2 and A4. Let $\beta \in (0, 1/2 - \eta)$ and $h \in
\mathcal{L}_{V^\beta}$, where $\eta$ is as in A2. Let $p > 1$ be such that $2p(\beta + \eta) \le 1$. Suppose, in
addition, that

£(n- 1 6- 1 ) P <°°. E(""V) 1A(p/2) <oo, 

n>l n>l 

(4.7) 

^ n - 2+ (l/2)v(l/p) 6 -l/2 <00 flnd ^ 6 (r-l)p <00 . 
n>l n>l 

The conclusion of Theorem 4.2 then holds.

Proof. The proof is given in the supplementary article [5]. □ 

Remark 4.2. Not all kernels used in practice will satisfy A4. For in- 
stance, A4 holds for kernels in the family (4.1) but fails to hold for the 
Parzen kernel (4.2). 

In Theorem 4.3, we can again choose $b_n \propto n^{-\delta}$, where $\delta = \frac{2}{3}(1 - 0.5 \vee
(2(\beta + \eta)))$. It is easy to check that if A4 holds with $r > 1 + 2(\beta + \eta)\delta^{-1}$ and
we take $p = (2(\beta + \eta))^{-1}$, then this choice of $b_n$ satisfies (4.7).

In the next corollary, we consider the Markov chain case. 

Corollary 4.1. Suppose that $\{X_n, n \ge 0\}$ is a phi-irreducible,
aperiodic Markov chain with transition kernel $P$ and invariant distribution $\pi$.
Assume that $P$ satisfies A1. Let $\beta \in (0, 1/2)$ and $h \in \mathcal{L}_{V^\beta}$. Then $\sigma^2(h) :=
\pi(h^2) + 2\sum_{j \ge 1}\pi(hP^j h)$ is finite. Assume A3 and take $b_n \propto n^{-\delta}$ with $\delta =
\frac{2}{3}(1 - 0.5 \vee (2\beta))$. Then

$$\lim_{n\to\infty} \Gamma_n^2(h) = \sigma^2(h) \qquad \text{in } L^{(2\beta)^{-1}}.$$

Supposing, in addition, that $\beta \in (0, 1/4)$ and $\delta \in (2\beta, 1/2)$, or that A4 holds
with $r > 1 + 2\beta\delta^{-1}$, then the convergence holds almost surely ($\mathbb{P}$) as well.

4.1. Application to the adaptive Metropolis algorithm. We shall now
apply the above result to the adaptive Metropolis algorithm described in
Example 2.1. We continue to use the notation established in that example. We
recall that $\mathsf{X} = \mathbb{R}^d$, $\Theta = \Theta_\mu(r_1) \times \Theta_+(r_2)$, where $\Theta_\mu(r_1)$ is the ball in $\mathsf{X}$ with
center 0 and radius $r_1 > 0$ and $\Theta_+(r_2)$ is the set of all symmetric positive
semidefinite matrices $A$ with $|A| \le r_2$. Define $\ell(x) = \log\pi(x)$. We assume
that:

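B1: $\pi$ is positive and continuously differentiable,

$$\lim_{|x|\to\infty} \Big\langle \frac{x}{|x|}, \nabla\ell(x) \Big\rangle = -\infty$$

and

$$\limsup_{|x|\to\infty} \Big\langle \frac{x}{|x|}, \frac{\nabla\ell(x)}{|\nabla\ell(x)|} \Big\rangle < 0,$$

where $\nabla\ell$ is the gradient of $\ell$.

B1 is known to imply A1 with $V(x) = (\sup_{x \in \mathsf{X}} \pi^\zeta(x))\pi^{-\zeta}(x)$, for any $\zeta \in
(0, 1)$ ([2, 20]). We denote by $\mu_\star$ and $\Sigma_\star$ the mean and covariance matrix of
$\pi$, respectively. We assume that $(\mu_\star, \Sigma_\star) \in \Theta$, which can always be achieved
by taking $r_1, r_2$ large enough.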

By Lemma 12 of [2], for any $\beta \in (0, 1]$,

(4.8) D p {6 )<c\ En E n _i 

for any $\eta > 0$. Thus, A2 holds and $\eta$ can be taken to be arbitrarily small. We
can now summarize Proposition 3.1 and Theorems 4.1-4.3 for the random
Metropolis algorithm. We focus here on the choice of bandwidth $b_n \propto n^{-\delta}$,
where $\delta = \frac{2}{3}(1 - 0.5 \vee (2\beta))$, but similar conclusions can be derived from the
theorems for other bandwidths.



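Proposition 4.1. Assume B1, let $V(x) = (\sup_{x \in \mathsf{X}} \pi^\zeta(x))\pi^{-\zeta}(x)$ for
$\zeta \in (0, 1)$ and suppose that $(\mu_\star, \Sigma_\star) \in \Theta$. Then $\theta_n = (\mu_n, \Sigma_n)$ converges in
probability to $\theta_\star = (\mu_\star, \Sigma_\star)$. Let $\beta \in (0, 1/2)$ and $h \in \mathcal{L}_{V^\beta}$.

1. $n^{-1/2}\sum_{k=1}^n h(X_k)$ converges weakly to $\mathcal{N}(\pi(h), \sigma_\star^2(h))$ as $n \to \infty$,
where $\sigma_\star^2(h) = \pi(h^2) + 2\sum_{j \ge 1}\pi(hP_{\theta_\star}^j h)$ and $\theta_\star = \Sigma_\star + \varepsilon I_d$.

2. Suppose that A3 holds and we choose $b_n \propto n^{-\delta}$, $\delta = \frac{2}{3}(1 - 0.5 \vee (2\beta))$.
Then $\Gamma_n^2(h)$ converges to $\sigma_\star^2(h)$ in $L^p$ for $p = (2\beta)^{-1}$. If we additionally
suppose that $\beta \in (0, 1/4)$ and $\delta \in (2\beta, 1/2)$, or that A4 holds with $r > 1 +
2\beta\delta^{-1}$, then the convergence of $\Gamma_n^2(h)$ holds almost surely ($\mathbb{P}$) as well.

4.2. Choosing the bandwidth $b_n$. Consider Theorem 4.1. Suppose that
$\alpha > 2/3$ and that we take $b_n \propto n^{-\delta}$ for some $\delta \in (0, 1/2]$. Then $n^{-\alpha}b_n^{-1+\alpha} =
O(n^{-1/2})$. Similarly, $n^{-1}b_n^{-1} = O(n^{-1/2})$. Thus, the $L^p$-rate of convergence of
$\Gamma_n^2(h)$ is driven by $b_n$ and $n^{-1+(1/2)\vee(1/p)}b_n^{-1/2}$, and we deduce from equating
these two terms that the optimal choice of $b_n$ is given by $b_n \propto n^{-\delta}$ for $\delta =
\frac{2}{3}(1 - \frac{1}{2} \vee \frac{1}{p})$. Equation (4.6) then gives that

$$\mathbb{E}^{1/p}\Big(\Big|\Gamma_n^2(h) - \frac{1}{n}\sum_{k=1}^n G_{\theta_{k-1}}^2(X_{k-1}, X_k)\Big|^p\Big) \le Cn^{-\delta}.$$

In particular, if $4(\beta + \eta) \le 1$ (and $\alpha > 2/3$), we can take $p = 2$ and then
$\delta = 1/3$, which leads to

$$\mathbb{E}^{1/2}\Big(\Big|\Gamma_n^2(h) - \frac{1}{n}\sum_{k=1}^n G_{\theta_{k-1}}^2(X_{k-1}, X_k)\Big|^2\Big) \le Cn^{-1/3}.$$

The same $L^2$-rate of convergence was also derived in [16].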
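The balancing step that produces the optimal exponent can be written out explicitly. The following is only a worked version of the argument sketched above, under the notation $b_n = n^{-\delta}$ and with the shorthand $m := \frac{1}{2} \vee \frac{1}{p}$ introduced here for convenience.

```latex
% Two leading terms in the bound on E(|Q_n|^p)^{1/p}, with b_n = n^{-\delta}:
%   bias-type term:      b_n = n^{-\delta}
%   variance-type term:  n^{-1+m} b_n^{-1/2} = n^{-1+m+\delta/2},  m = (1/2) \vee (1/p)
\begin{align*}
  -\delta &= -1 + m + \tfrac{\delta}{2}
  &&\text{(equate the two exponents)} \\
  \tfrac{3}{2}\,\delta &= 1 - m
  \quad\Longrightarrow\quad
  \delta = \tfrac{2}{3}\,(1 - m), \\
  p = 2:\quad m &= \tfrac{1}{2}
  \quad\Longrightarrow\quad
  \delta = \tfrac{1}{3},
  \qquad b_n \propto n^{-1/3}.
\end{align*}
```

With this choice both terms are of order $n^{-\delta}$, while the remaining terms in (4.6) are $O(n^{-1/2})$ under $\alpha > 2/3$ and therefore do not affect the rate.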



Even with $b_n \propto n^{-1/3}$, the estimator is still very sensitive to the choice
of the proportionality constant $c$. Choosing $c$ is a difficult issue where more research is needed. Here, we
follow a data-driven approach adapted from [1] and [25]. In this approach, 



we take b n 



iV3 ' 



where 



co 



2^ 



1/3 



for some constants $c_0$ and $m$, where $\hat\rho_\ell$ is the $\ell$th order sample
autocorrelation of $\{h(X_n), n \ge 0\}$. [25] suggests choosing $m = n^{2/9}$. Our simulation
results show that small values of $c_0$ yield small variances but high biases,
and inversely for large values of $c_0$. The value $c_0$ also depends on how fast
the autocorrelation of the process decays. [25] derives some theoretical
results on the consistency of this procedure in the stationary case. Whether
these results hold in the present nonstationary case is an open question.



4.3. Discussion. The above results raise a number of issues. On one
hand, we note from Theorems 4.1-4.3 that the kernel estimator $\Gamma_n^2(h)$ does
not converge to the asymptotic variance $\sigma^2(h)$, but rather to the asymptotic
average squared variation $\Gamma^2(h)$. On the other hand, Proposition 3.1 shows
that although the asymptotic variance $\sigma^2(h)$ controls the fluctuations of
$n^{-1/2}\sum_{k=1}^n h(X_k)$ as $n \to \infty$, the limiting distribution of $n^{-1/2}\sum_{k=1}^n h(X_k)$
is not the Gaussian $\mathcal{N}(0, \sigma^2(h))$, but instead a mixture of Gaussian
distributions of the form $\sqrt{\Gamma^2(h)}\,Z$. With these conditions, how can one undertake
a valid error assessment from adaptive MCMC samplers?

If the adaptation parameter $\theta_n$ converges to a deterministic limit $\theta_\star$, then
one gets a situation similar to that of Markov chains. This is the ideal case.
Indeed, in such cases, $\Gamma^2(h) = \sigma^2(h)$, $n^{-1/2}\sum_{k=1}^n h(X_k)$ converges weakly to
a random variable $\mathcal{N}(0, \sigma^2(h))$ and the kernel estimator $\Gamma_n^2(h)$ converges to
the asymptotic variance $\sigma^2(h)$, where

$$\sigma^2(h) = \pi(h^2) + 2\sum_{j \ge 1}\pi(hP_{\theta_\star}^j h).$$
This case includes the adaptive Metropolis algorithm of [17], as discussed in 
Section 4.1. 

However, in some other cases (see, e.g., [2, 7]), what one can actually prove
is that $\theta_n \to \theta_\star$, where $\theta_\star$ is a discrete random variable with values in a subset
$\{\tau_1, \tau_2, \ldots, \tau_N\}$, say, of $\Theta$. This is typically the case when the adaptation is
driven by a stochastic approximation $\theta_{n+1} = \theta_n + \gamma_n H(\theta_n, X_{n+1})$, where the
mean field equation $h(\theta) := \int_{\mathsf{X}} H(\theta, x)\pi(dx) = 0$ has multiple solutions.

In these cases, $\Gamma_n^2(h)$ clearly provides a poor estimate for $\sigma^2(h)$, even
though it is not hard to see that

$$\lim_{n\to\infty} \mathbb{E}(\Gamma_n^2(h)) = \mathbb{E}(\Gamma^2(h)) = \sigma^2(h).$$

Furthermore, a confidence interval for $\pi(h)$ becomes difficult to build.
Indeed, the asymptotic distribution of $n^{-1/2}\sum_{k=1}^n h(X_k)$ is a mixture

$$\sum_{k \ge 1} p_k\,\mathcal{N}(0, \sigma_k^2(h)),$$

where $p_k := \mathbb{P}(\theta_\star = \tau_k)$ and $\sigma_k^2(h) = \pi(h^2) + 2\sum_{j \ge 1}\pi(hP_{\tau_k}^j h)$. As a
consequence, a valid confidence interval for $\pi(h)$ requires the knowledge of the
mixing distribution $p_k$ and the asymptotic variances $\sigma_k^2(h)$, which is much
more than one can obtain from $\Gamma_n^2(h)$. It is possible to improve on the
estimation of $\sigma^2(h)$ by running multiple chains, but this takes away some of
the advantages of the adaptive MCMC framework.
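A small numerical illustration may make the difficulty concrete. The sketch below uses entirely hypothetical numbers (two equally likely limits with asymptotic variances 1 and 9) and computes the probability that a draw from the resulting Gaussian mixture falls inside the nominal 95% interval built from a single Gaussian with the averaged variance. It is not a statement about any particular sampler.

```python
import math


def std_normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mixture_coverage(weights, variances, half_width):
    """P(|W| <= half_width) when W follows sum_k p_k N(0, sigma_k^2)."""
    return sum(p * (2.0 * std_normal_cdf(half_width / math.sqrt(v)) - 1.0)
               for p, v in zip(weights, variances))


weights = [0.5, 0.5]          # hypothetical mixing probabilities p_k
variances = [1.0, 9.0]        # hypothetical asymptotic variances sigma_k^2
sigma2 = sum(p * v for p, v in zip(weights, variances))   # averaged variance
half_width = 1.959964 * math.sqrt(sigma2)                 # nominal 95% half-width
coverage = mixture_coverage(weights, variances, half_width)
```

With these numbers the averaged variance is 5 and the coverage under the mixture comes out below the nominal 95%, whereas the same interval would be essentially exact if the limit were the single Gaussian with variance 5.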

In view of this discussion, when Monte Carlo error assessment is impor- 
tant, it seems that the framework of adaptive MCMC is most useful when the 
adaptation mechanism is such that there exists a unique, well-defined,
optimal kernel $P_{\theta_\star}$ that the algorithm converges to. This is the case, for example,
with the popular adaptive RWM of [17] discussed above and its extension 
to the MALA (Metropolis adjusted Langevin algorithm; see, e.g., [4]). 

5. Examples. 

5.1. The GARCH(1, 1) model. To illustrate the above results in the Markov 
chain case, we consider the linear GARCH(1, 1) model defined as follows: 
Hq G (0, oo), no ~ A/"(0, ho) and, for n > 1, 

u -h 1/2 e 

h n = u; + /3hn-i +au 2 n _ 1 , 
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where {e n ,n > 0} is i.i.d. W(0, 1) and uj > 0, a > 0, /3 > 0. We assume that 
a,/3 satisfy the following. 

El: There exists v > such that 

(5.1) E[(/3 + aZ 2 ) u ] < 1, Z~tf(0,l). 

It is shown by [21], Theorem 2, that under (5.1), the joint process {(u n ,h n ), 
n > 0} is a phi-irreducible aperiodic Markov chain that admits an invariant 
distribution and is geometrically ergodic with a drift function V(u, h) = 
1 + h u + \u\ 2u . Therefore, Al holds and we can apply Corollary 4.1. We 
write E,r to denote expectation taken under the stationary measure. We are 
interested in the asymptotic variance of the functions h(u) = u 2 . We can 

calculate the exact value. Define p n d = Conv(uQ, u 2 ). As observed by [11] in 
introducing the GARCH models, if (5.1) hold with some v > 2, then 

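$$\rho_1 = \frac{\alpha(1 - \alpha\beta - \beta^2)}{1 - 2\alpha\beta - \beta^2}, \qquad \rho_n = \rho_1(\alpha + \beta)^{n-1}, \quad n \ge 2.$$

Also,

$$\mathrm{Var}_\pi(u_0^2) = \frac{3\omega^2(1 + \alpha + \beta)}{(1 - \alpha - \beta)(1 - \beta^2 - 2\alpha\beta - 3\alpha^2)} - \left(\frac{\omega}{1 - \alpha - \beta}\right)^2$$

and we obtain

$$\sigma^2(h) = \mathrm{Var}_\pi(u_0^2)\left[1 + \frac{2\rho_1}{1 - \alpha - \beta}\right].$$
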
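
As a numerical check of the closed form, the sketch below evaluates the stationary variance of $u_0^2$, the lag-one autocorrelation $\rho_1$, and the resulting asymptotic variance, using the GARCH(1, 1) moment formulas that the text attributes to [11]. The function name is hypothetical, and the formulas are written out in the code so that the assumptions are explicit: a finite fourth moment and $\rho_n = \rho_1(\alpha+\beta)^{n-1}$.

```python
def garch_asymptotic_variance(omega, alpha, beta):
    """Asymptotic variance of h(u) = u^2 for a stationary GARCH(1, 1) chain,
    assuming 3*alpha^2 + 2*alpha*beta + beta^2 < 1 (finite fourth moment)."""
    s = alpha + beta
    m2 = omega / (1.0 - s)                                  # E_pi(u^2)
    m4 = (3.0 * omega ** 2 * (1.0 + s)
          / ((1.0 - s) * (1.0 - beta ** 2 - 2.0 * alpha * beta - 3.0 * alpha ** 2)))
    var_u2 = m4 - m2 ** 2                                   # Var_pi(u^2)
    rho1 = alpha * (1.0 - alpha * beta - beta ** 2) / (1.0 - 2.0 * alpha * beta - beta ** 2)
    # sum_{n>=1} rho_n = rho1 / (1 - alpha - beta), a geometric series
    return var_u2 * (1.0 + 2.0 * rho1 / (1.0 - s))


sigma2 = garch_asymptotic_variance(omega=1.0, alpha=0.1, beta=0.7)   # about 119.1
```

With $\omega = 1$, $\alpha = 0.1$, $\beta = 0.7$ this gives $\mathrm{Var}_\pi(u_0^2) \approx 54.41$, $\rho_1 \approx 0.119$ and an asymptotic variance of about 119.1.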

For the simulations, we set $\omega = 1$, $\alpha = 0.1$, $\beta = 0.7$, which gives $\sigma^2(h) =
119.1$. For these values, (5.1) holds with at least $\nu = 4$. We tested the
Bartlett and the Parzen kernels, for which A3 holds. We choose the bandwidth
following the approach outlined in Remark 4.2 with $c_0 = 1.5$. We run
the GARCH(1, 1) Markov chain for 250,000 iterations and discard the first
10,000 iterations as burn-in. We compute $\Gamma_n^2(h)$ at every 1000 iterations
along the sample path. The results are plotted in Figure 1.
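
The experiment can be sketched in a few lines, under stated assumptions. The code simulates the GARCH(1, 1) recursion and computes a lag-window estimate of the form $\gamma_n(0) + 2\sum_{k\ge 1} w(k/c)\gamma_n(k)$ with the Bartlett or Parzen kernel. The truncation lag `c` is a hypothetical stand-in for the bandwidth rule of Remark 4.2, which is not reproduced here, and the run is much shorter than the 250,000 iterations used for Figure 1.

```python
import math
import random


def bartlett(x):
    """Bartlett kernel: w(x) = 1 - |x| for |x| <= 1, and 0 otherwise."""
    ax = abs(x)
    return 1.0 - ax if ax <= 1.0 else 0.0


def parzen(x):
    """Parzen kernel, supported on [-1, 1]."""
    ax = abs(x)
    if ax <= 0.5:
        return 1.0 - 6.0 * ax ** 2 + 6.0 * ax ** 3
    if ax <= 1.0:
        return 2.0 * (1.0 - ax) ** 3
    return 0.0


def kernel_variance(xs, c, w=bartlett):
    """gamma(0) + 2 * sum_{k>=1} w(k / c) * gamma(k), where gamma(k) is the
    lag-k sample autocovariance normalized by n. The lag c is an assumption."""
    n = len(xs)
    mean = sum(xs) / n
    d = [x - mean for x in xs]
    total = sum(v * v for v in d) / n
    # Lags k >= c receive zero weight for kernels supported on [-1, 1].
    for k in range(1, min(math.ceil(c), n)):
        gamma_k = sum(d[i] * d[i + k] for i in range(n - k)) / n
        total += 2.0 * w(k / c) * gamma_k
    return total


def simulate_garch_squares(n, omega, alpha, beta, h0=1.0, seed=0):
    """Return [u_1^2, ..., u_n^2] for u_n = h_n^{1/2} eps_n,
    h_n = omega + beta * h_{n-1} + alpha * u_{n-1}^2."""
    rng = random.Random(seed)
    h = h0
    u = math.sqrt(h) * rng.gauss(0.0, 1.0)    # u_0 ~ N(0, h_0)
    out = []
    for _ in range(n):
        h = omega + beta * h + alpha * u * u
        u = math.sqrt(h) * rng.gauss(0.0, 1.0)
        out.append(u * u)
    return out


path = simulate_garch_squares(6000, omega=1.0, alpha=0.1, beta=0.7)
estimate = kernel_variance(path[1000:], c=30.0)   # drop a burn-in of 1000
```

Because $u^2$ is heavy-tailed under this model, a run of this length can land far from the exact value; the sketch shows the mechanics of the estimator, not its accuracy.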

Fig. 1. Asymptotic variance estimation for GARCH(1, 1) with $\omega = 1$, $\alpha = 0.1$, $\beta = 0.7$
based on 250,000 iterations. (a) is Bartlett kernel, (b) is Parzen kernel.

5.2. Logistic regression. We also illustrate the results with MCMC and
adaptive MCMC. We consider the logistic regression model

$$y_i \sim \mathcal{B}(p_\beta(x_i)), \qquad i = 1, \ldots, n,$$

where $y_i \in \{0, 1\}$ and $p_\beta(x) = e^{x\beta}(1 + e^{x\beta})^{-1}$ for a parameter $\beta \in \mathbb{R}^d$ and a
covariate vector $x^T \in \mathbb{R}^d$, where $x^T$ denotes the transpose of $x$. $\mathcal{B}(p)$ is the
Bernoulli distribution with parameter $p$. The log-likelihood is

$$\ell(\beta|X) = \sum_{i=1}^n \left[y_i x_i\beta - \log(1 + e^{x_i\beta})\right].$$

We assume a Gaussian prior distribution $\pi(\beta) \propto e^{-|\beta|^2/(2s^2)}$ for some
constant $s > 0$, leading to a posterior distribution

$$\pi(\beta|X) \propto e^{\ell(\beta|X)}e^{-|\beta|^2/(2s^2)}.$$
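
The unnormalized log posterior and its gradient can be written down directly from the log-likelihood and the Gaussian prior. The sketch below is a plain-Python illustration; the function names and the list-of-rows data layout are assumptions, and $\log(1 + e^{\eta})$ is computed in a numerically stable form.

```python
import math


def log_posterior(beta, X, y, s):
    """Unnormalized log posterior l(beta|X) - |beta|^2 / (2 s^2).
    X is a list of covariate rows x_i, y a list of 0/1 responses."""
    loglik = 0.0
    for xi, yi in zip(X, y):
        eta = sum(a * b for a, b in zip(xi, beta))           # x_i beta
        log1pexp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        loglik += yi * eta - log1pexp
    return loglik - sum(b * b for b in beta) / (2.0 * s * s)


def grad_log_posterior(beta, X, y, s):
    """Gradient -beta / s^2 + sum_i (y_i - p_beta(x_i)) x_i."""
    grad = [-b / (s * s) for b in beta]
    for xi, yi in zip(X, y):
        eta = sum(a * b for a, b in zip(xi, beta))
        if eta >= 0.0:
            p = 1.0 / (1.0 + math.exp(-eta))
        else:
            e = math.exp(eta)
            p = e / (1.0 + e)
        for j, xij in enumerate(xi):
            grad[j] += (yi - p) * xij
    return grad
```

Since $|y_i - p_\beta(x_i)| \le 1$, the likelihood part of the gradient is bounded in $\beta$, so the prior term $-\beta/s^2$ dominates for large $|\beta|$.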

The RWM algorithm described in Example 2.1 is a possible choice to sample
from the posterior distribution. We compare a plain RWM with proposal
density $\mathcal{N}(0, e^c I_d)$ with $c = -2.3$ and the adaptive RWM described in
Algorithm 2.1 using the family $\{P_\theta, \theta \in \Theta\}$, where $\Theta = \Theta_\mu(r_1) \times \Theta_+(r_2)$, as
defined in Example 2.1. It is easy to check that B1 holds. Indeed, we have

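$$\nabla \log \pi(\beta|X) = -\frac{1}{s^2}\beta + \sum_{i=1}^n (y_i - p_\beta(x_i))x_i^T$$

and $\left|\sum_{i=1}^n (y_i - p_\beta(x_i))\langle \beta, x_i^T\rangle\right| \le |\beta|\sum_{i=1}^n |x_i|$. We deduce that

$$\left\langle \frac{\beta}{|\beta|}, \nabla \log \pi(\beta|X)\right\rangle \le -\frac{|\beta|}{s^2} + \sum_{i=1}^n |x_i|.$$

Similarly,

$$\left\langle \frac{\beta}{|\beta|}, \frac{\nabla \log \pi(\beta|X)}{|\nabla \log \pi(\beta|X)|}\right\rangle \le -\frac{|\beta|}{s^2|\nabla \log \pi(\beta|X)|} + \frac{1}{|\nabla \log \pi(\beta|X)|}\sum_{i=1}^n |x_i|,$$

and the right-hand side converges to $-1$ since $|\nabla \log \pi(\beta|X)| \sim s^{-2}|\beta|$ as
$|\beta| \to \infty$. Therefore, B1 holds. If we choose $r_1, r_2$ large enough so that
$\ldots \in \Theta$, then Proposition 4.1 holds and applies to any measurable
function $h$ such that $|h(\beta)| \le C\pi^{-t}(\beta|X)$ for some $t \in [0, 1/2)$.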
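
A plain RWM sampler with proposal $\mathcal{N}(0, e^c I_d)$ can be sketched as follows. This is a generic Metropolis implementation written for illustration, not a transcription of Example 2.1 or Algorithm 2.1; the function name `rwm`, the seed argument, and the choice to return the acceptance rate are assumptions.

```python
import math
import random


def rwm(log_target, x0, n_iter, c=-2.3, seed=0):
    """Random walk Metropolis with Gaussian proposal N(0, e^c I_d).
    Returns the list of states and the empirical acceptance rate."""
    rng = random.Random(seed)
    sd = math.exp(c / 2.0)             # covariance e^c I_d has std dev e^(c/2)
    x = list(x0)
    lp = log_target(x)
    chain, accepted = [], 0
    for _ in range(n_iter):
        prop = [xi + sd * rng.gauss(0.0, 1.0) for xi in x]
        lp_prop = log_target(prop)
        # Accept with probability min(1, pi(prop) / pi(x)).
        if lp_prop >= lp or rng.random() < math.exp(lp_prop - lp):
            x, lp = prop, lp_prop
            accepted += 1
        chain.append(list(x))
    return chain, accepted / n_iter


# Toy usage on a standard Gaussian target in dimension 2 (not the heart data).
chain, acc_rate = rwm(lambda b: -0.5 * sum(v * v for v in b), [0.0, 0.0], 2000)
```

The adaptive RWM differs in that the proposal covariance is updated along the run within the constraint set $\Theta$; that update is not sketched here.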

Fig. 2. Asymptotic variance estimation for logistic regression modeling of the heart data
set. Outputs of the coefficient $\beta_2$ are reported, based on 250,000 iterations.

As a simulation example, we test the model with the Heart data set, which
has $n = 217$ cases and $d = 14$ covariates. The dependent variable is the
presence or absence of a heart disease and the explanatory variables are relevant
covariates. More details can be found in [23]. We use Parzen and Bartlett
kernels with $c_0 = 20$ for the Markov chain and $c_0 = 5$ for the adaptive chain.
We run both chains for 250,000 iterations and discard the first 50,000
iterations as burn-in. The results are plotted in Figure 2 for the coefficient $\beta_2$.
We also report in Table 1 below the resulting confidence intervals for the
first four coefficients $(\beta_1, \ldots, \beta_4)$.
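
A confidence interval of this kind can be formed from a sample mean and an estimate of the asymptotic variance through the normal approximation, that is, mean $\pm\, z\sqrt{\Gamma_n^2(h)/n}$. The sketch below is a minimal illustration; the function name and the default 95% level are assumptions, since the level used for Table 1 is not stated in this section.

```python
import math
from statistics import NormalDist


def confidence_interval(sample_mean, gamma2, n, level=0.95):
    """Normal-approximation interval for pi(h), given the sample mean of h
    over n iterations and an asymptotic variance estimate gamma2."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)    # two-sided quantile
    half_width = z * math.sqrt(gamma2 / n)
    return sample_mean - half_width, sample_mean + half_width


# Hypothetical inputs, not values from the heart data experiment.
lower, upper = confidence_interval(sample_mean=0.0, gamma2=4.0, n=100)
```

A smaller asymptotic variance estimate gives a narrower interval for the same run length, which is the comparison that Table 1 makes between the plain and adaptive samplers.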



6. Proofs. This section contains the proofs of the statements from Sec- 
tions 2-3. The remaining proofs are available in the supplementary paper [5]. 
Throughout this section, we shall use C to denote a generic constant whose 
actual value might change from one appearance to the next. On multiple 
occasions, we make use of the Kronecker lemma and the Toeplitz lemma. 
We refer the reader to [18], Section 2.6, for a statement and proof of these 
lemmata. 

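We shall routinely use the following martingale inequality. Let $\{D_i, \mathcal{F}_i, i \ge
1\}$ be a martingale difference sequence. For any $p > 1$,

$$\mathbb{E}\left(\left|\sum_{i=1}^n D_i\right|^p\right) \le C\left\{\sum_{i=1}^n \mathbb{E}^{1\wedge(2/p)}(|D_i|^p)\right\}^{1\vee(p/2)}, \tag{6.1}$$

where $C$ can be taken as $C = (18pq^{1/2})^p$, $p^{-1} + q^{-1} = 1$.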
Table 1
Confidence intervals for the first four parameters of the
model for the heart data set

Parameters    Plain MCMC          Adaptive RWM
$\beta_1$     [-0.271, -0.239]    [-0.272, -0.257]
$\beta_2$     [-0.203, -0.158]    [-0.182, -0.170]
$\beta_3$     [0.744, 0.785]      [0.776, 0.793]
$\beta_4$     [0.727, 0.756]      [0.736, 0.750]


We also notice that for any $q \in [1, \beta^{-1}]$, Lemma A.1(a)-(b) implies that

$$\sup_{k\ge 1}\mathbb{E}(|G_{\theta_{k-1}}(X_{k-1}, X_k)|^q) < \infty. \tag{6.2}$$

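6.1. Proof of Proposition 2.1. Let $S_n \stackrel{\mathrm{def}}{=} \sum_{k=1}^n h_{\theta_{k-1}}(X_k)$. For $\theta \in \Theta$, we
define $g_\theta(x) = \sum_{j\ge 0} P_\theta^j h_\theta(x)$. When $h_\theta$ does not depend on $\theta$, we obtain the
function $g_\theta$ defined in Section 3. Similarly, we define $G_\theta(x, y) = g_\theta(y) - P_\theta g_\theta(x)$.
Using the Poisson equation $g_\theta - P_\theta g_\theta = h_\theta$, we rewrite $S_n$ as $S_n = M_n + R_n$,
where

$$M_n \stackrel{\mathrm{def}}{=} \sum_{k=1}^n G_{\theta_{k-1}}(X_{k-1}, X_k)$$

and

$$R_n \stackrel{\mathrm{def}}{=} P_{\theta_0}g_{\theta_0}(X_0) - P_{\theta_n}g_{\theta_n}(X_n) + \sum_{k=1}^n \left(P_{\theta_k}g_{\theta_k}(X_k) - P_{\theta_{k-1}}g_{\theta_{k-1}}(X_k)\right).$$

Using Lemma A.1 and A1, we easily see that

$$|R_n| \le C\left(V^\beta(X_0) + V^\beta(X_n) + \sum_{k=1}^n \left(D_\beta(\theta_k, \theta_{k-1}) + |h_{\theta_k} - h_{\theta_{k-1}}|_{V^\beta}\right)V^\beta(X_k)\right).$$

For $p > 1$ such that $\beta p \le 1$, $\sum_{n\ge 1} n^{-p}\mathbb{E}((V^\beta(X_0) + V^\beta(X_n))^p) < \infty$. This
is a consequence of Lemma A.1(a) and the Minkowski inequality. Thus,
$n^{-1}(V^\beta(X_0) + V^\beta(X_n))$ converges almost surely to zero. By (2.7) and the
Kronecker lemma, the term $n^{-1}\sum_{k=1}^n (D_\beta(\theta_k, \theta_{k-1}) + |h_{\theta_k} - h_{\theta_{k-1}}|_{V^\beta})V^\beta(X_k)$
converges almost surely to zero. We conclude that $n^{-1}R_n$ converges almost
surely to zero.

$\{(M_n, \mathcal{F}_n), n \ge 1\}$ is a martingale. Again, let $p > 1$ be such that $\beta p \le 1$.
Equation (6.1) and Lemma A.1(a) together imply that $\mathbb{E}(|M_n|^p) = O(n^{1\vee(p/2)})$,
which, combined with Proposition A.1 of [5], implies that $n^{-1}M_n$ converges
almost surely to zero.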
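
The decomposition $S_n = M_n + R_n$ can be verified numerically in the simplest non-adaptive case, where the kernel and the function do not change with $k$. The sketch below uses a hypothetical two-state Markov chain, for which the Poisson equation has an explicit solution; the function name and the parametrization by `a` and `b` are assumptions of the example.

```python
import random


def poisson_decomposition(a, b, h, n, seed=0):
    """Simulate n steps of the two-state chain P = [[1-a, a], [b, 1-b]] and
    return (S_n, M_n, R_n) for the centered function hc = h - pi(h)."""
    rng = random.Random(seed)
    P = [[1.0 - a, a], [b, 1.0 - b]]
    pi = [b / (a + b), a / (a + b)]
    mean = pi[0] * h[0] + pi[1] * h[1]
    hc = [h[0] - mean, h[1] - mean]
    lam = 1.0 - a - b                       # P hc = lam * hc
    g = [v / (1.0 - lam) for v in hc]       # g = sum_j P^j hc solves g - Pg = hc
    Pg = [P[i][0] * g[0] + P[i][1] * g[1] for i in range(2)]
    x0 = 0
    x = x0
    S = M = 0.0
    for _ in range(n):
        x_new = 1 if rng.random() < P[x][1] else 0
        S += hc[x_new]                      # h(X_k)
        M += g[x_new] - Pg[x]               # G(X_{k-1}, X_k)
        x = x_new
    R = Pg[x0] - Pg[x]                      # Pg(X_0) - Pg(X_n)
    return S, M, R


S, M, R = poisson_decomposition(0.3, 0.2, [0.0, 1.0], 1000)
```

Here $R_n$ stays bounded while $M_n$ carries the fluctuations of order $n^{1/2}$, which is the mechanism used in the proof; in the adaptive case $R_n$ contains the extra sum over the changes in $\theta_k$.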
6.2. Proof of Proposition 3.1. This is a continuation of the previous
proof. In the present case, $h_\theta = h$, so that $g_\theta$ and $G_\theta$ are the functions
defined in Section 3. Again, let $S_n = \sum_{k=1}^n h(X_k)$. We have $S_n = M_n + R_n$,
where $M_n \stackrel{\mathrm{def}}{=} \sum_{k=1}^n G_{\theta_{k-1}}(X_{k-1}, X_k)$ and

$$|R_n| \le C\left(V^\beta(X_0) + V^\beta(X_n) + \sum_{k=1}^n D_\beta(\theta_k, \theta_{k-1})V^\beta(X_k)\right).$$

$\mathbb{E}(V^\beta(X_0) + V^\beta(X_n))$ is bounded in $n$, thus $n^{-1/2}(V^\beta(X_0) + V^\beta(X_n))$
converges in probability to zero. By (3.3) and the Kronecker lemma, the term
$n^{-1/2}\sum_{k=1}^n D_\beta(\theta_k, \theta_{k-1})V^\beta(X_k)$ converges almost surely to zero. We
conclude that $n^{-1/2}R_n$ converges in probability to zero.

$\{(M_n, \mathcal{F}_n), n \ge 1\}$ is a martingale. Since $\beta < 1/2$, (6.2) implies that $\{(M_n,
\mathcal{F}_n), n \ge 1\}$ is a square integrable martingale and also that we have

$$\sup_{n\ge 1}\mathbb{E}\left(\max_{1\le k\le n} n^{-1}G^2_{\theta_{k-1}}(X_{k-1}, X_k)\right) < \infty \quad \text{and}$$

$$\lim_{n\to\infty}\max_{1\le k\le n} n^{-1/2}|G_{\theta_{k-1}}(X_{k-1}, X_k)| = 0 \quad \text{(in probability)}. \tag{6.3}$$

Equations (3.2) and (6.3) imply, by Theorem 3.2 of [18], that $n^{-1/2}M_n$
converges weakly to a random variable $\sqrt{\Gamma^2(h)}Z$, where $Z \sim \mathcal{N}(0, 1)$ and
is independent of $\Gamma^2(h)$.

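6.3. Proof of Proposition 3.3. We have

$$\frac{1}{n}\sum_{k=1}^n G^2_{\theta_{k-1}}(X_{k-1}, X_k) = \frac{1}{n}\sum_{k=1}^n \left(G^2_{\theta_{k-1}}(X_{k-1}, X_k) - P_{\theta_{k-1}}G^2_{\theta_{k-1}}(X_{k-1})\right)$$

$$\quad + \frac{1}{n}\sum_{k=1}^n \left(P_{\theta_{k-1}}G^2_{\theta_{k-1}}(X_{k-1}) - \int_{\mathcal{X}} \pi(dx)P_{\theta_{k-1}}G^2_{\theta_{k-1}}(x)\right)$$

$$\quad + \frac{1}{n}\sum_{k=1}^n \int_{\mathcal{X}} \pi(dx)\left(P_{\theta_{k-1}}G^2_{\theta_{k-1}}(x) - P_{\theta_\star}G^2_{\theta_\star}(x)\right) + \int_{\mathcal{X}} P_{\theta_\star}G^2_{\theta_\star}(x)\pi(dx)$$

$$= T_n^{(1)} + T_n^{(2)} + T_n^{(3)} + \int_{\mathcal{X}} P_{\theta_\star}G^2_{\theta_\star}(x)\pi(dx),$$

say. The term $T_n^{(1)}$ is an $\mathcal{F}_n$-martingale. Indeed, $\mathbb{E}(G^2_{\theta_{k-1}}(X_{k-1}, X_k)|\mathcal{F}_{k-1}) =
P_{\theta_{k-1}}G^2_{\theta_{k-1}}(X_{k-1})$, $\mathbb{P}$-a.s. Furthermore, by (6.2), the martingale differences
$G^2_{\theta_{k-1}}(X_{k-1}, X_k) - P_{\theta_{k-1}}G^2_{\theta_{k-1}}(X_{k-1})$ are $L^p$-bounded for some $p > 1$. By [18],
Theorem 2.22, we conclude that $T_n^{(1)}$ converges in $L^1$ to zero.

The term $T_n^{(2)}$ converges in probability to zero as a consequence of the law
of large numbers (Proposition 2.1). Using the definition of $D_\beta$ and Lemma
A.1(a)-(b), we can find a constant $C$ such that

$$\left|\int_{\mathcal{X}} \pi(dx)\left(P_{\theta_n}G^2_{\theta_n}(x) - P_{\theta_\star}G^2_{\theta_\star}(x)\right)\right| \le C\left(D_\beta(\theta_n, \theta_\star) + D_{2\beta}(\theta_n, \theta_\star)\right)\int_{\mathcal{X}} V^{2\beta}(x)\pi(dx),$$

almost surely. It follows that $T_n^{(3)}$ also converges in $\mathbb{P}$-probability to zero.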

6.4. Proof of Proposition 3.4. From the proof of Proposition 2.1 above,
we have seen that $S_n = M_n + R_n$, and it is easy to check that $\mathbb{E}(|R_n|^2) =
O(n^{2(1-\alpha)})$ and, by (6.2), $\mathbb{E}(|M_n|^2) = O(n)$. Therefore,

$$\left|\mathrm{Var}(n^{-1/2}S_n) - n^{-1}\mathbb{E}(M_n^2)\right| = \left|2n^{-1}\mathbb{E}(M_nR_n) + n^{-1}\mathbb{E}(R_n^2) - n^{-1}(\mathbb{E}(R_n))^2\right| = O(n^{1/2-\alpha}) \to 0 \quad \text{as } n \to \infty$$

since $\alpha > 1/2$. Now,

$$n^{-1}\mathbb{E}(M_n^2) = \mathbb{E}\left(n^{-1}\sum_{k=1}^n G^2_{\theta_{k-1}}(X_{k-1}, X_k)\right).$$

Again, from (6.2), the sequence $n^{-1}\sum_{k=1}^n G^2_{\theta_{k-1}}(X_{k-1}, X_k)$ is uniformly
integrable which, combined with (3.2) and Lebesgue's dominated convergence
theorem, implies that $n^{-1}\mathbb{E}(M_n^2)$ converges to $\mathbb{E}(\Gamma^2(h))$.

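APPENDIX A: SOME USEFUL CONSEQUENCES OF A1

Lemma A.1. Assume that $\{P_\theta, \theta \in \Theta\}$ satisfies A1.

(a) There exists a finite constant $C$ such that

$$\sup_{n\ge 0}\mathbb{E}(V(X_n)) \le C. \tag{A.1}$$

(b) Let $\beta \in (0, 1]$ and $\{h_\theta \in \mathcal{L}_{V^\beta}, \theta \in \Theta\}$ be such that $\pi(h_\theta) = 0$,
$\sup_{\theta\in\Theta}|h_\theta|_{V^\beta} < \infty$. The function $g_\theta \stackrel{\mathrm{def}}{=} \sum_{j\ge 0}P_\theta^j h_\theta$ is then well defined,
$|g_\theta|_{V^\beta} \le C|h_\theta|_{V^\beta}$, where the constant $C$ does not depend on
$\{h_\theta \in \mathcal{L}_{V^\beta}, \theta \in \Theta\}$. Moreover, we can take $C$ such that for any $\theta, \theta' \in \Theta$,

$$|g_\theta - g_{\theta'}|_{V^\beta} \le C\sup_{\theta\in\Theta}|h_\theta|_{V^\beta}\left(D_\beta(\theta, \theta') + |h_\theta - h_{\theta'}|_{V^\beta}\right). \tag{A.2}$$

(c) Assume A2. Let $\beta \in (0, 1 - \eta)$ and $h \in \mathcal{L}_{V^\beta}$ be such that $\pi(h) = 0$.
Define $S_n(j) = \sum_{\ell=j+1}^{j+n} h(X_\ell)$. Let $p \in (1, (\beta + \eta)^{-1})$. There then exists a
finite constant $C$ that does not depend on $n$, $j$, $\theta$ or $h$ such that

$$\mathbb{E}(|S_n(j)|^p) \le C|h|_{V^\beta}n^{1\vee(p/2)}.$$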

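Proof. Parts (a) and (b) are standard results (see, e.g., [2]). To prove (c),
we use the Poisson equation (3.1) to write

$$S_n(j) = \sum_{\ell=j+1}^{j+n} G_{\theta_{\ell-1}}(X_{\ell-1}, X_\ell) + P_{\theta_j}g_{\theta_j}(X_j) - P_{\theta_{j+n}}g_{\theta_{j+n}}(X_{j+n}) + \sum_{\ell=j+1}^{j+n}\left(g_{\theta_\ell}(X_\ell) - g_{\theta_{\ell-1}}(X_\ell)\right).$$

By A1 and part (a), we have

$$\sup_{n\ge 1}\sup_{j\ge 0}\mathbb{E}\left[|P_{\theta_j}g_{\theta_j}(X_j) - P_{\theta_{j+n}}g_{\theta_{j+n}}(X_{j+n})|^p\right] \le C|h|_{V^\beta}.$$

By Burkholder's inequality and some standard inequalities,

$$\mathbb{E}\left(\left|\sum_{\ell=j+1}^{j+n} G_{\theta_{\ell-1}}(X_{\ell-1}, X_\ell)\right|^p\right) \le C\left\{\sum_{\ell=j+1}^{j+n}\mathbb{E}^{1\wedge(2/p)}\left(|G_{\theta_{\ell-1}}(X_{\ell-1}, X_\ell)|^p\right)\right\}^{1\vee(p/2)} \le C|h|_{V^\beta}n^{1\vee(p/2)}.$$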
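Part (b) and A2 together give

$$\mathbb{E}\left(\left|\sum_{\ell=j+1}^{j+n}\left(g_{\theta_{\ell-1}}(X_\ell) - g_{\theta_\ell}(X_\ell)\right)\right|^p\right) \le C|h|_{V^\beta}\mathbb{E}\left(\left|\sum_{\ell=j+1}^{j+n} D_\beta(\theta_{\ell-1}, \theta_\ell)V^\beta(X_\ell)\right|^p\right)$$

$$\le C|h|_{V^\beta}\mathbb{E}\left(\left|\sum_{\ell=j+1}^{j+n}\gamma_\ell V^{\beta+\eta}(X_\ell)\right|^p\right) \le C|h|_{V^\beta}\left(\sum_{\ell=j+1}^{j+n}\gamma_\ell\right)^p$$
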

and, since 7 n = 0(n ' ), we are done. □ 
SUPPLEMENTARY MATERIAL

Supplement to "Kernel estimators of asymptotic variance for adaptive
Markov chain Monte Carlo" (DOI: 10.1214/10-AOS828SUPP; .pdf). The
proofs of Theorems 4.1-4.3 require some technical and lengthy arguments
that we develop in this supplement.